Inductive Databases and Multiple Uses of Frequent Itemsets: The cInQ Approach

Author

  • Jean-François Boulicaut
Abstract

Inductive databases (IDBs) have been proposed to address the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a query language that is powerful enough to support all the required processing, such as data preprocessing, pattern discovery and pattern postprocessing. We present a synthetic view of important concepts that have been studied within the cInQ European project when considering the pattern domain of itemsets. Mining itemsets has proved useful not only for association rule mining but also for feature construction, classification, clustering, etc. We introduce the concepts of pattern domain, evaluation functions, primitive constraints, inductive queries and solvers for itemsets. We focus on simple high-level definitions that let the reader set aside technical details, which can be found, among other places, in cInQ publications.
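
The vocabulary listed in the abstract can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the cInQ formalism or any cInQ solver: the toy database `DB`, the helper names `frequency`, `min_frequency`, `contains` and `solve`, and the brute-force enumeration are all hypothetical choices made for this example. It treats frequency as an evaluation function, builds two primitive constraints from it, and answers an inductive query defined as their conjunction.

```python
from itertools import combinations

# Toy transactional database (made-up data, for illustration only).
DB = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

def frequency(itemset, db):
    """Evaluation function: number of transactions that contain the itemset."""
    return sum(1 for transaction in db if itemset <= transaction)

def min_frequency(threshold):
    """Primitive constraint: the itemset occurs in at least `threshold` transactions."""
    return lambda itemset, db: frequency(itemset, db) >= threshold

def contains(item):
    """Primitive (syntactic) constraint: the itemset includes a given item."""
    return lambda itemset, db: item in itemset

def solve(db, constraints):
    """Naive solver: enumerate every non-empty itemset over the items of db
    and keep those that satisfy the conjunction of all constraints."""
    items = sorted(set().union(*db))
    answer = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if all(constraint(itemset, db) for constraint in constraints):
                answer.append(itemset)
    return answer

# Inductive query: itemsets that contain "a" and occur in at least 3 transactions.
answer = solve(DB, [min_frequency(3), contains("a")])
print([sorted(s) for s in answer])  # [['a'], ['a', 'b'], ['a', 'c']]
```

The exhaustive enumeration is exponential in the number of items and is used here only to keep the example short. Practical solvers avoid it by exploiting constraint properties such as the anti-monotonicity of the minimum-frequency constraint (every subset of a frequent itemset is itself frequent).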

Similar resources

A Novel Approach to Mining Maximal Frequent Itemsets Based on Genetic Algorithm

We present a new approach based on a genetic algorithm to generate maximal frequent itemsets from large databases. This new algorithm, called GeneticMax, is a heuristic that mimics natural selection to find maximal frequent itemsets efficiently. The search strategy of this algorithm uses a lexicographic tree that avoids level-by-level searching, which finally reduces the time req...

Post-mining: maintenance of association rules by weighting

This paper proposes a new strategy for maintaining association rules in dynamic databases. This method uses a weighting technique to highlight new data. Our approach is novel in that recently added transactions are given higher weights. In particular, we look at how frequent itemsets can be maintained incrementally. We propose a competitive model to ‘promote’ infrequent itemsets to frequent items...

CHARM: An Efficient Algorithm for Closed Itemset Mining

The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It ...

Constraint-Based Discovery and Inductive Queries: Application to Association Rule Mining

Recently, inductive databases (IDBs) have been proposed to address the problem of knowledge discovery from huge databases. Querying these databases requires primitives to: (1) select, manipulate and query data, (2) select, manipulate and query “interesting” patterns (i.e., those patterns that satisfy certain constraints), and (3) cross over patterns and data (e.g., selecting the data in which so...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

Publication date: 2004